MPI collectives at scale

Authors

  • Christoph Niethammer
  • Pekka Manninen
  • Rupert W. Nash
  • Dmitry Khabi
  • Jose Gracia
Abstract

Collective operations improve the performance and reduce the code complexity of many applications parallelized with the message-passing interface (MPI) paradigm. In this article, we investigate the impact of load imbalance on the performance of collective operations, as well as the possibility of hiding the parallel overhead caused by a collective communication pattern by overlapping the communication with computation. Finally, we present a use case of non-blocking collectives in a real-world application, namely the established lattice-Boltzmann fluid solver HemeLB.
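The communication/computation overlap mentioned in the abstract relies on MPI's non-blocking collectives, which are part of the MPI-3 standard. The following minimal C sketch (illustrative only, not code from the paper or from HemeLB) shows the basic pattern: a reduction is started with MPI_Iallreduce, independent local work is done while the collective may progress in the background, and the program only blocks in MPI_Wait once the reduced value is actually needed.

/* Minimal sketch of overlapping a non-blocking collective with computation.
 * Illustrative example only; not taken from the paper or from HemeLB. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local  = (double)rank;  /* per-rank contribution */
    double global = 0.0;

    /* Start the non-blocking reduction ... */
    MPI_Request req;
    MPI_Iallreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
                   MPI_COMM_WORLD, &req);

    /* ... and perform independent computation while it progresses. */
    double work = 0.0;
    for (int i = 0; i < 1000000; ++i)
        work += 1.0 / (i + 1.0);

    /* Block only when the reduced result is actually required. */
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    if (rank == 0)
        printf("sum of ranks = %g (local work = %g)\n", global, work);

    MPI_Finalize();
    return 0;
}

How much overhead this actually hides depends on whether the MPI library progresses the collective asynchronously and on the load balance across ranks, which is exactly the interplay the article examines.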

Journal:

Volume:   Issue:

Pages:

Publication year: 2014